Privacy Preserving DBSCAN Algorithm for Clustering

Authors

  • K. Anil Kumar
  • C. Pandu Rangan
Abstract

In this paper we address the issue of privacy-preserving clustering. Specifically, we consider a scenario in which two parties owning confidential databases wish to run a clustering algorithm on the union of their databases without revealing any unnecessary information. This problem is a specific example of secure multi-party computation and, as such, can be solved using known generic protocols. Several clustering algorithms are available, but they are applicable only to specific types of data, whereas DBSCAN [4] is applicable to all types of data and the clusters it produces are similar to natural clusters. However, DBSCAN [4] is designed to work on a single database. In this paper we propose protocols for measuring the distances between data points when the data is distributed across two parties. Using these protocols, we propose the first method for running the DBSCAN algorithm over vertically and horizontally partitioned data sets, distributed across two different databases, in a privacy-preserving manner.
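The abstract does not spell out the protocols themselves, so the following is only a plaintext sketch of the quantity such a protocol would need to evaluate in the vertically partitioned case. It relies on the fact that a squared Euclidean distance splits into a sum over the attributes one party holds plus a sum over the attributes the other party holds. The function names (`partial_sq_dist`, `is_eps_neighbor`, `region_query`) and the party names Alice and Bob are illustrative assumptions, not taken from the paper, and the comparison is done in the clear here; a privacy-preserving version would replace it with a secure two-party comparison.

```python
from typing import List, Sequence


def partial_sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance over the attributes a single party holds."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_eps_neighbor(alice_part: float, bob_part: float, eps: float) -> bool:
    # ASSUMPTION: plaintext stand-in. In a privacy-preserving protocol this
    # comparison would be computed securely, so that neither party learns
    # the other's partial sum.
    return alice_part + bob_part <= eps ** 2


def region_query(
    alice_cols: Sequence[Sequence[float]],
    bob_cols: Sequence[Sequence[float]],
    i: int,
    eps: float,
) -> List[int]:
    """Indices of all records within distance eps of record i (DBSCAN's Eps-neighborhood)."""
    return [
        j
        for j in range(len(alice_cols))
        if is_eps_neighbor(
            partial_sq_dist(alice_cols[i], alice_cols[j]),
            partial_sq_dist(bob_cols[i], bob_cols[j]),
            eps,
        )
    ]


# Hypothetical data: Alice holds the first attribute of each record,
# Bob holds the second attribute of the same records.
alice_cols = [[0.0], [3.0], [10.0]]
bob_cols = [[0.0], [4.0], [0.0]]

print(region_query(alice_cols, bob_cols, 0, 5.0))
```

With these values, record 1 lies at distance 5 from record 0 (3 in Alice's attribute, 4 in Bob's), so it falls inside the neighborhood for `eps = 5.0`, while record 2 at distance 10 does not.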

Similar resources

A study of the problems of the DBSCAN clustering algorithm and a review of the improvements proposed for it

Clustering is an important knowledge discovery technique in databases. Density-based clustering algorithms are one of the main methods for clustering in data mining. These algorithms have some special features, including independence from the shape of the clusters, high understandability, and ease of use. DBSCAN is a base algorithm for density-based clustering algorithms. DBSCAN is able to...

Improvement of density-based clustering algorithm using modifying the density definitions and input parameter

Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed for this method; one of the most widely used is called DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...

Repeated Record Ordering for Constrained Size Clustering

One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...

Privacy Preserving Publication of Locations Based on Delaunay Triangulation

The pervasive use of LBS (Location-Based Services) poses a serious risk to personal privacy. To preserve the privacy of locations, only anonymized or perturbed data are published. At the same time, the data mining results for the perturbed data should stay as close as possible to those for the original data. In this paper, we propose a novel perturbation ...

Privacy Preserving Clustering

The freedom and transparency of information flow on the Internet has heightened concerns about privacy. Given a set of data items, clustering algorithms group similar items together. Clustering has many applications, such as customer behavior analysis, targeted marketing, forensics, and bioinformatics. In this paper, we present the design and analysis of a privacy-preserving k-means clustering algo...

Publication date: 2007